[FEATURE] Improve EncryptorImpl with Asynchronous Handling for Scalability #3919

akolarkunnu · 2025-06-17T12:22:17Z

Description

Removed the usage of ContDownLatch.
Every requests will be submitted and returns the Future.
Added a list to track the ongoing master key generation. If any tenant id is in the list, then it's key generation is on going and it will wait until other thread completes the key generation. Same time system will accept other requests, if key is already available in the map that will proceed otherwise key generation for new tenant will start in different thread. So, multiple tenants key generation can happen simultaneously.

Related Issues

Resolves #3510

Check List

New functionality includes testing.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ility Removed the usage of ContDownLatch. Every requets will be submitted and returns the Future. Added a list to track the ongoing master key generation. If any tenant id is in the list, then it's key generation is on going and it will wait until other thread completes the key genearion. Same time system will accept other requests, if key is already avaialble in the map that will procced otherwise key generation for new tenant will start in different thread. So, multiple tenants key generation can happen simulatneuosly. Resolves opensearch-project#3510 Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>

dhrubo-os · 2025-06-17T20:43:54Z

Awesome! Thanks for raising the PR. This will be a great improvement. I'll start actively reviewing this PR from tomorrow.

Can you also please update your PR in details like how did you test for single tenancy and also for multi tenancy?

codecov · 2025-06-17T21:46:09Z

Codecov Report

Attention: Patch coverage is 81.81818% with 12 lines in your changes missing coverage. Please review.

Project coverage is 80.40%. Comparing base (969dc3d) to head (80ad33a).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
.../opensearch/ml/common/connector/HttpConnector.java	75.00%	3 Missing and 1 partial ⚠️
...g/opensearch/ml/common/connector/McpConnector.java	75.00%	3 Missing and 1 partial ⚠️
.../opensearch/ml/engine/encryptor/EncryptorImpl.java	88.23%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##               main    #3919   +/-   ##
=========================================
  Coverage     80.39%   80.40%           
- Complexity     7910     7915    +5     
=========================================
  Files           693      693           
  Lines         34849    34863   +14     
  Branches       3872     3877    +5     
=========================================
+ Hits          28018    28030   +12     
  Misses         5096     5096           
- Partials       1735     1737    +2

Flag	Coverage Δ
ml-commons	`80.40% <81.81%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

mingshl · 2025-06-17T22:37:29Z

common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java

+                if (decrypted.get(key) != null) {
+                    decryptedCredential.put(key, decrypted.get(key).get());
+                } else {
+                    decryptedCredential.put(key, null);


when the key is null, and you put value of null, what is the intention here? why don't you skip when the key is null?

+1 why we are assigning null here?

+1 why not use putIfAbsent()?

mingshl · 2025-06-17T22:38:51Z

common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java

+                    decryptedCredential.put(key, null);
+                }
+            } catch (InterruptedException | ExecutionException e) {
+                throw new RuntimeException(e);


please add meaningful exception message to indicate what goes wrong here. something like, failed to process fill Credentials.

Zhangxunmt · 2025-06-18T00:07:39Z

ml-algorithms/src/main/java/org/opensearch/ml/engine/encryptor/EncryptorImpl.java

-        final CryptoResult<byte[], JceMasterKey> encryptResult = crypto
-            .encryptData(jceMasterKey, plainText.getBytes(StandardCharsets.UTF_8));
-        return Base64.getEncoder().encodeToString(encryptResult.getResult());
+    public Future<String> encrypt(String plainText, String tenantId) {


why not use try catch for better error handling?
return CompletableFuture.supplyAsync(() -> {
try {
if (plainText == null || tenantId == null) {
throw new IllegalArgumentException("plainText and tenantId cannot be null");
}

initMasterKey(tenantId); final AwsCrypto crypto = createAwsCrypto(); JceMasterKey jceMasterKey = createJceMasterKey(tenantId); final CryptoResult<byte[], JceMasterKey> encryptResult = crypto .encryptData(jceMasterKey, plainText.getBytes(StandardCharsets.UTF_8)); return Base64.getEncoder().encodeToString(encryptResult.getResult()); } catch (Exception e) { throw new CompletionException("Encryption failed", e); }

Zhangxunmt · 2025-06-18T00:09:55Z

common/src/main/java/org/opensearch/ml/common/connector/McpConnector.java

        this.decryptedHeaders = createDecryptedHeaders(headers);
    }

+    private void fillCredential(Map<String, Future<String>> decrypted, Map<String, String> decryptedCredential) {


Input validation?
if (decrypted == null || decryptedCredential == null) {
throw new IllegalArgumentException("Input maps cannot be null");
}

Better yet since we're creating a new map for decryptedCredentials we don't care what it is if we immediately overwrite it here.

akolarkunnu · 2025-06-18T15:59:14Z

There are some issues with these changes, Mainly in the initializeNewMasterKey(). I am looking to those.

dbwiddis · 2025-06-19T19:18:02Z

common/src/main/java/org/opensearch/ml/common/connector/Connector.java

@@ -79,9 +80,9 @@ public interface Connector extends ToXContentObject, Writeable {

    <T> T createPayload(String action, Map<String, String> parameters);

-    void decrypt(String action, BiFunction<String, String, String> function, String tenantId);
+    void decrypt(String action, BiFunction<String, String, Future<String>> function, String tenantId);


Consider an ActionFuture here. It has better handling of OpenSearch-specific thread pools, exceptions, and task APIs.

dbwiddis · 2025-06-19T21:19:38Z

common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java

+        decryptedCredential = new HashMap<>();
        for (String key : credential.keySet()) {
-            decrypted.put(key, function.apply(credential.get(key), tenantId));
+            decryptingTempCredential.put(key, function.apply(credential.get(key), tenantId));
        }
-        this.decryptedCredential = decrypted;
+        fillCredential(decryptingTempCredential, decryptedCredential);


This section of code is very confusing. It took me a good 10 minutes to figure out that we're updating a superclass field. Suggestions to improve readability:

keep this. or maybe even super. prefix to make it clear this isn't a local variable

it makes llittle sense to create an empty map (in a superclass field) and do nothing with it but pass it as an argument. If you just called fillCredential(tempMap, this.(en|de)cryptedCredential) and created the empty map as the first step in that method it'd be more readable.

Similar comments apply to both en/de and Http/Mcp connectors.

dbwiddis · 2025-06-19T21:20:16Z

common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java

+                if (decrypted.get(key) != null) {
+                    decryptedCredential.put(key, decrypted.get(key).get());
+                } else {
+                    decryptedCredential.put(key, null);


+1 why not use putIfAbsent()?

dbwiddis · 2025-06-19T21:22:59Z

common/src/main/java/org/opensearch/ml/common/connector/HttpConnector.java

+                    decryptedCredential.put(key, null);
+                }
+            } catch (InterruptedException | ExecutionException e) {
+                throw new RuntimeException(e);


If you switch from Future to ActionFuture some of this exception handling is already included.

dbwiddis · 2025-06-19T21:25:15Z

common/src/main/java/org/opensearch/ml/common/connector/McpConnector.java

        this.decryptedHeaders = createDecryptedHeaders(headers);
    }

+    private void fillCredential(Map<String, Future<String>> decrypted, Map<String, String> decryptedCredential) {


Better yet since we're creating a new map for decryptedCredentials we don't care what it is if we immediately overwrite it here.

dbwiddis · 2025-06-19T21:28:59Z

common/src/test/java/org/opensearch/ml/common/connector/AwsConnectorTest.java

+        encryptFunction = (s, v) -> CompletableFuture.supplyAsync(() -> "encrypted: " + s.toLowerCase(Locale.ROOT));
+        decryptFunction = (s, v) -> CompletableFuture.supplyAsync(() -> "decrypted: " + s.toUpperCase(Locale.ROOT));


You're not including an Executor argument here which means the async part will execute on the ForkJoinPool.commonPool(). This can't be shut down and in theory could trigger spurious thread leak detection.

You should be using a thread pool here. Example.

Similar comment in HttpConnectorTest and AbstractConnectorTest.

dbwiddis · 2025-06-19T23:17:05Z

ml-algorithms/src/main/java/org/opensearch/ml/engine/encryptor/EncryptorImpl.java

+        return CompletableFuture.supplyAsync(() -> {
+            initMasterKey(tenantId);
+            final AwsCrypto crypto = AwsCrypto.builder().withCommitmentPolicy(CommitmentPolicy.RequireEncryptRequireDecrypt).build();
+            JceMasterKey jceMasterKey = createJceMasterKey(tenantId);
+
+            final CryptoResult<byte[], JceMasterKey> encryptResult = crypto
+                .encryptData(jceMasterKey, plainText.getBytes(StandardCharsets.UTF_8));
+            return Base64.getEncoder().encodeToString(encryptResult.getResult());
+        });


Here is where using ForkJoinPool.commonPool() is actually bad and you really do need a thread pool.

dbwiddis

Hey @akolarkunnu, first I want to thank you for your contribution! I've experienced flaky test behavior from this particular class (see #2888) and this may help with that.

That said, I've made multiple implementations on OpenSearch plugins and our Remote Metadata SDK using futures and learned some hard lessons along the way. I did leave a line by line review above but wanted to follow up with some general comments.

TLDR on using Futures... avoid them if you can implement something with an ActionListener. These are longstanding well-established and tested callback mechanisms that handle most asynchronous work in OpenSearch. When you have a single "thread" with async breaks they are almost always the correct way to handle it. When you are awaiting multiple things to happen, that's where it gets complex. That may be the case here.
OpenSearch has an ActionFuture class that doubles as an ActionListener. It has an actionGet() method that implements some exception handling (unwrapping the nested exceptions, etc.) that is pretty useful. Please take a look at using that.
Generally speaking I see the code replacing an entire map with an entirely new map every time an encrypt/decrypt call is made. It's hard for me to think this approach is thread safe without synchronization of the map. It seems to me we can probably easily add to a map without touching the other keys but the existing "replace the map" implementation raises a lot of questions.
Consider an approach using an AtomicReference to the decryptedCredential map, taking advantage of the updateAndGet() method. (I'm not sure that will work here, but it seems a possible improvement over iterate-all-and-replace-without-atomicity.

jngz-es · 2025-06-23T19:21:19Z

TLDR on using Futures... avoid them if you can implement something with an ActionListener.

I couldn't agree more on this point.

akolarkunnu requested review from b4sjoo, dhrubo-os, mingshl, jngz-es, model-collapse, rbhavna, ylwu-amzn, zane-neo, Zhangxunmt, austintlee, HenryL27 and xinyual as code owners June 17, 2025 12:22

akolarkunnu temporarily deployed to ml-commons-cicd-env-require-approval June 17, 2025 12:23 — with GitHub Actions Inactive

akolarkunnu temporarily deployed to ml-commons-cicd-env-require-approval June 17, 2025 21:46 — with GitHub Actions Inactive

mingshl reviewed Jun 17, 2025

View reviewed changes

Zhangxunmt reviewed Jun 18, 2025

View reviewed changes

Merge branch 'main' into EncryptorRefactoring

5031849

dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval June 18, 2025 15:19 — with GitHub Actions Waiting

akolarkunnu closed this Jun 19, 2025

akolarkunnu reopened this Jun 19, 2025

akolarkunnu requested a deployment to ml-commons-cicd-env-require-approval June 19, 2025 14:53 — with GitHub Actions Waiting

Merge branch 'main' into EncryptorRefactoring

d0ce2b3

dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval June 19, 2025 16:00 — with GitHub Actions Waiting

dbwiddis reviewed Jun 19, 2025

View reviewed changes

dbwiddis reviewed Jun 20, 2025

View reviewed changes

Merge branch 'main' into EncryptorRefactoring

5723dd7

dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval June 20, 2025 13:55 — with GitHub Actions Waiting

Merge branch 'main' into EncryptorRefactoring

189cee3

dhrubo-os requested a deployment to ml-commons-cicd-env-require-approval June 23, 2025 19:26 — with GitHub Actions Waiting

		encryptFunction = (s, v) -> CompletableFuture.supplyAsync(() -> "encrypted: " + s.toLowerCase(Locale.ROOT));
		decryptFunction = (s, v) -> CompletableFuture.supplyAsync(() -> "decrypted: " + s.toUpperCase(Locale.ROOT));

[FEATURE] Improve EncryptorImpl with Asynchronous Handling for Scalability #3919

Are you sure you want to change the base?

[FEATURE] Improve EncryptorImpl with Asynchronous Handling for Scalability #3919

Conversation

akolarkunnu commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Related Issues

Check List

Uh oh!

dhrubo-os commented Jun 17, 2025

Uh oh!

codecov bot commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

akolarkunnu commented Jun 18, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dbwiddis Jun 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dbwiddis left a comment

Choose a reason for hiding this comment

Uh oh!

jngz-es commented Jun 23, 2025

Uh oh!

Uh oh!

akolarkunnu commented Jun 17, 2025 •

edited

Loading

codecov bot commented Jun 17, 2025 •

edited

Loading

dbwiddis Jun 19, 2025 •

edited

Loading